Automatic Transliteration of Judeo-Arabic Texts into Arabic Script

Author

  • Kfir
Abstract

The Judeo-Arabic languages comprise a set of dialects spoken and written by Jewish communities living in Arab countries, mainly during the Middle Ages. Judeo-Arabic is typically written in Hebrew letters, enriched with various diacritic marks. The Judeo-Arabic spoken and written by any particular Jewish community is similar to the Arabic dialect used by the local Muslim community. In addition, Judeo-Arabic dialects borrow words from Aramaic and Hebrew, sometimes modified according to Arabic morphological rules. Since the Arabic alphabet is larger than the Hebrew one, diacritic marks are added to some Hebrew letters to render Arabic consonants that the Hebrew alphabet lacks. Judeo-Arabic authors often use different letters and diacritic marks to represent the same Arabic consonant. For example, some authors use ג to represent ج and ג׳ to represent غ, while others reverse the two. This inconsistency increases the ambiguity of a given word, making Judeo-Arabic texts challenging to read even for an Arabic speaker. Such inconsistencies may be observed even within a single document. For instance, the letter י sometimes represents the letter ي, as in the word פי ("in", في); sometimes represents the letter ئ, as in the word סילת ("I was asked"); and sometimes represents the letter ى, as in the Arabic word على ("on" / "to" / "at").

Currently, many works in Judeo-Arabic are being made available on the web. However, most Arabic speakers are unfamiliar with the Hebrew script, let alone the way it is used to render Judeo-Arabic. Therefore, there is a crucial need for automatic tools capable of transliterating Judeo-Arabic texts into Arabic letters. Since Judeo-Arabic texts usually contain some non-Arabic words, transliteration is only one step toward providing a full Arabic translation of a given Judeo-Arabic input. We focus mainly on the transliteration process, leaving Hebrew and Aramaic words in their original Hebrew script.

As mentioned above, Judeo-Arabic is not a single language but rather a set of dialects, each used by the local Jewish community in some of the Arab countries. Some texts are similar to Classical Arabic, the ancestor of Modern Standard Arabic (MSA), which is widely used today in formal settings, while other texts are more similar to local Muslim dialects. We focus, for now, on the Judeo-Arabic version that is similar to MSA rather than on the colloquial versions. We model …
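
The ambiguity described in the abstract can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the authors' model (the abstract is cut off before the model is described): the table `CANDIDATES` and the function `candidate_transliterations` are hypothetical names, and the table covers only the handful of letters needed for the abstract's own examples. Letters are written as Unicode escapes to keep the right-to-left text unambiguous in source code.

```python
from itertools import product

# Hypothetical, deliberately tiny candidate table (NOT the authors' mapping).
# Keys are Hebrew letters; values are Arabic letters they may stand for.
CANDIDATES = {
    "\u05D9": ["\u064A", "\u0626", "\u0649"],  # yod -> ya, hamza on ya, alif maqsura
    "\u05E4": ["\u0641"],  # pe -> fa
    "\u05E1": ["\u0633"],  # samekh -> sin
    "\u05DC": ["\u0644"],  # lamed -> lam
    "\u05EA": ["\u062A"],  # tav -> ta
}


def candidate_transliterations(word: str) -> list:
    """Return every Arabic spelling the table allows for a Hebrew-script word."""
    # Letters missing from the table are passed through unchanged.
    options = [CANDIDATES.get(ch, [ch]) for ch in word]
    return ["".join(combo) for combo in product(*options)]


# The word for "in" (pe + yod) yields three candidates, only one of which
# is the intended Arabic spelling.
for candidate in candidate_transliterations("\u05E4\u05D9"):
    print(candidate)
```

Enumerating candidates only exposes the ambiguity. Choosing the intended spelling requires additional context, which is the part of the task that an automatic transliteration tool has to solve.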

Similar resources

Statistical Transliteration of Judeo-Arabic Texts

Judeo-Arabic is a group of Arabic-based languages used by Jews for many centuries, which, like some other Jewish languages (for example, Yiddish), is written in the Hebrew alphabet. Many of the great Jewish literary works of the Middle Ages were written in Judeo-Arabic. A large quantity of additional Judeo-Arabic text has become available with the digitization of manuscripts found in the Cairo Ge...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To facilitate the entry of data into computers and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Using Transliteration of Proper Names from Arabic to Latin Script to Improve English-Arabic Word Alignment

Bilingual lexicons of proper names play a vital role in machine translation and cross-language information retrieval. Word alignment approaches are generally used to construct bilingual lexicons automatically from parallel corpora. Aligning proper names is particularly difficult when the source and target languages of the parallel corpus do not share the same script. We present in ...

Study of the impact of proper name transliteration on the performance of word alignment in French-Arabic parallel corpora (Etude de l'impact de la translittération de noms propres sur la qualité de l'alignement de mots à partir de corpus parallèles français-arabe) [in French]

Bilingual lexicons play a vital role in cross-language information retrieval and machine translation. The manual construction of these lexicons is often costly and time-consuming. Word alignment techniques are generally used to construct bilingual lexicons from parallel texts. Aligning single words and nominal syntagms from parallel texts is a relatively well-controlled task for languages using...

Revised Proposal to Encode Gujarati Signs for the Transliteration of Arabic (Anshuman Pandey)

The proposed signs are used for the transliteration of the Arabic script into Gujarati by Ismaili Khoja communities. They are used for representing Arabic letters and signs for which correspondences do not exist in Gujarati. They were devised in the late 19th century and are standard elements of the Gujarati orthography used by the Ithnashari Khoja (“Twelver Shia”) and the Agakhani Khoja commun...

Publication date: 2014